Case studies

Dae-Jin Lee < dlee@bcamath.org >

BCAM-UPV/EHU Courses 2018-2019

Example: Air quality data

In this Section, we analyze the data(airquality) (see ?airquality) consisting of air quality measurements in New York, from Maty to September 1973. The data frame contains with 154 observations on 6 variables.

For more details ?airquality

data(airquality)
pairs(airquality)
*Scatterplot matrix*

Scatterplot matrix

A simple scatterplot shows that some of the variables have a non-linear relationship.

Let us consider a linear model for the response variable Ozone

airq.lm <- lm(Ozone ~ Temp + Wind + Solar.R, data=airquality)
summary(airq.lm)
## 
## Call:
## lm(formula = Ozone ~ Temp + Wind + Solar.R, data = airquality)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.485 -14.219  -3.551  10.097  95.619 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -64.34208   23.05472  -2.791  0.00623 ** 
## Temp          1.65209    0.25353   6.516 2.42e-09 ***
## Wind         -3.33359    0.65441  -5.094 1.52e-06 ***
## Solar.R       0.05982    0.02319   2.580  0.01124 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.18 on 107 degrees of freedom
##   (42 observations deleted due to missingness)
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.5948 
## F-statistic: 54.83 on 3 and 107 DF,  p-value: < 2.2e-16

Let us plot the results

par(mfrow=c(1,3))
termplot(airq.lm,se=TRUE)
*Plots of `lm` fit*

Plots of lm fit

We can also plot the residuals of the model in order to check the model hypothesis

par(mfrow=c(1,2))
plot(airq.lm,which=1:2)
*Plots of `lm` residuals*

Plots of lm residuals

The lack of normality in the residuals is due to the asymmetry of the response variable Ozone, we can apply a log transform to achieve asymmetry, i.e.:

par(mfrow=c(1,2))
hist(airquality$Ozone, main="Ozone")
hist(log(airquality$Ozone), main ="log(Ozone)")
*histograms of `Ozone` and `log(Ozone)`*

histograms of Ozone and log(Ozone)

We can now fit a linear model of the \(\log(Ozone)\), does the model look more adecuate?

lairq.lm <- lm(log(Ozone)~ Temp + Wind + Solar.R, data=airquality)
summary(lairq.lm)
plot(lairq.lm)

Fitting a linear model we can conclude that there might be some heterokedasticity not accounted by the model. However, the most possible cause is that the relationship between the response variable and the covariates are far from linear.

Let us fit a GAM model, firstly with a single variable, i.e,:

library(mgcv)
## Loading required package: nlme
## This is mgcv 1.8-24. For overview type 'help("mgcv-package")'.
airq.gam1 <- gam(log(Ozone) ~ s(Wind,bs="ps",m=2,k=10), 
                 method="REML", select=TRUE,data=airquality)
summary(airq.gam1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.4185     0.0655   52.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F  p-value    
## s(Wind) 2.565      9 6.453 2.76e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.336   Deviance explained =   35%
## -REML = 128.69  Scale est. = 0.49769   n = 116

The results show that the smooth effect s(Wind) is significative (with an approximate \(p\)-value close to zero and edf \(2.56\)). The next Figure shows the smooth wind effect. The increment of the wind speed decreases the \(\log\) Ozone levels (dramatically until 10mph). Note that including the intercept, the smooth effects are centered at zero.

plot(airq.gam1,residuals=TRUE,scheme=1)
*Estimated smooth effect of `Wind` on `log(Ozone)`. Data points are the residuals*

Estimated smooth effect of Wind on log(Ozone). Data points are the residuals

gam.check(airq.gam1)
*Check plots by `gam.check`*

Check plots by gam.check

## 
## Method: REML   Optimizer: outer newton
## full convergence after 8 iterations.
## Gradient range [-1.455961e-06,1.183099e-06]
## (score 128.6926 & scale 0.4976891).
## Hessian positive definite, eigenvalue range [0.4379927,57.51548].
## Model rank =  10 / 10 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##           k'  edf k-index p-value    
## s(Wind) 9.00 2.56    0.69  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that, in model airq.gam1 we used k=10 knots, the variable Wind has \(31\) unique values, and hence 10 knots should be enough. We fit a model for the residuals of the fitted gam model

resids <- residuals(airq.gam1)
resids.gam <- gam(resids~s(Wind,k=20,m=2),method="REML",
                  select=TRUE,data=airq.gam1$model)
summary(resids.gam)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## resids ~ s(Wind, k = 20, m = 2)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.137e-15  6.477e-02       0        1
## 
## Approximate significance of smooth terms:
##               edf Ref.df F p-value
## s(Wind) 7.419e-05     19 0       1
## 
## R-sq.(adj) =  -5.66e-07   Deviance explained = 7.89e-06%
## -REML = 124.14  Scale est. = 0.48659   n = 116

The results show that there is no unexplained variability between the variable and the residuals. Hence, we can conclude that we do not need more knots. The next Figure supports this statement.

plot(resids.gam)
*Wind effect vs the residuals of `airq.gam1`*

Wind effect vs the residuals of airq.gam1

airq.pred <- data.frame(Wind=seq(min(airquality$Wind),max(airquality$Wind)),
                        length.out=200)
p <- predict(airq.gam1, newdata = airq.pred, type="response", se.fit = TRUE)

plot(airq.pred$Wind,p$fit, xlab="Wind", ylab="log(Ozone)", 
     type="l", ylim=c(0,6))
lines(airq.pred$Wind,p$fit + 1.96 * p$se.fit, lty=2)
lines(airq.pred$Wind,p$fit - 1.96 * p$se.fit, lty=2)
points(airquality$Wind,log(airquality$Ozone),cex=.55,col="grey", pch=15)
*Predicted curve and CI's*

Predicted curve and CI's

Now we add the variable Temp:

airq.gam2 <- gam(log(Ozone)~s(Wind,bs="ps",m=2,k=10)+s(Temp,bs="ps",m=2,k=10),
                 method="REML",select=TRUE,data=airquality)
summary(airq.gam2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10) + s(Temp, bs = "ps", 
##     m = 2, k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.41852    0.04963   68.89   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df      F  p-value    
## s(Wind) 2.068      9  1.353 0.000981 ***
## s(Temp) 3.990      9 10.496  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.619   Deviance explained = 63.9%
## -REML = 101.64  Scale est. = 0.28568   n = 116

Hence, the Temp effect is significative. The next Figure show both smooth effects.

par(mfrow=c(1,2))
plot(airq.gam2,residuals=TRUE,scheme=1)
*Estimated smooth effect of `Wind` and `Temp` on `log(Ozone)`. Data points are the residuals*

Estimated smooth effect of Wind and Temp on log(Ozone). Data points are the residuals

We can compare both models using the function anova for a \(F\)-test:

anova(airq.gam1,airq.gam2)
## Analysis of Deviance Table
## 
## Model 1: log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10)
## Model 2: log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10) + s(Temp, bs = "ps", 
##     m = 2, k = 10)
##   Resid. Df Resid. Dev     Df Deviance
## 1    111.16     55.958                
## 2    105.98     31.123 5.1832   24.835

We can conclude that Temp variable is relevant. Note that, strictly, in this case both models are not nested, as the effective dimension of the variable Wind is different when the variable Temp is present. Hence, we use the AIC to confirm the results.

AIC(airq.gam1)
## [1] 255.0315
AIC(airq.gam2)
## [1] 195.6561

Now, we include the covariate Solar.R. Notice that this variables contains missing values (NA's).

sum(is.na(airquality$Solar.R))
## [1] 7

In order to compare the previous model airq.gam2 and the new model that includes Solar.R we will use the same data, so we will omit the missing values and refit the models.

new.airquality <- na.omit(airquality)

airq.gam22=gam(log(Ozone)~s(Wind,bs="ps",m=2,k=10)+s(Temp,bs="ps",m=2,k=10),
               method="REML",select=TRUE,data=new.airquality)

airq.gam3=gam(log(Ozone)~s(Wind,bs="ps",m=2,k=10)+s(Temp,bs="ps",m=2,k=10)+s(Solar.R,bs="ps",m=2,k=20),
              method="REML",select=TRUE,data=new.airquality)

summary(airq.gam22)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10) + s(Temp, bs = "ps", 
##     m = 2, k = 10)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.41593    0.05128   66.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df     F  p-value    
## s(Wind) 2.1879      9 1.919    1e-04 ***
## s(Temp) 0.9874      9 8.601 7.73e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.611   Deviance explained = 62.2%
## -REML = 95.215  Scale est. = 0.29194   n = 111
summary(airq.gam3)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Ozone) ~ s(Wind, bs = "ps", m = 2, k = 10) + s(Temp, bs = "ps", 
##     m = 2, k = 10) + s(Solar.R, bs = "ps", m = 2, k = 20)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.41593    0.04586   74.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##              edf Ref.df     F  p-value    
## s(Wind)    2.318      9 2.255 2.44e-05 ***
## s(Temp)    1.852      9 6.128 1.12e-12 ***
## s(Solar.R) 2.145     19 1.397 1.23e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.689   Deviance explained = 70.7%
## -REML = 86.106  Scale est. = 0.23342   n = 111
AIC(airq.gam22)
## [1] 185.7769
AIC(airq.gam3)
## [1] 166.0423

Is the Solar.R variable relevant?

The next Figure shows the estimated smooth effects for Wind, Temp and Solar.R covariables (partial residuals are also plotted).

par(mfrow=c(2,2))
plot(airq.gam3,residuals=TRUE)
*Smooth effects of `Wind`, `Temp` and `Solar.R`*

Smooth effects of Wind, Temp and Solar.R

Example: Soy bean data

Data from an experiment to compare growth patterns of two genotypes of soybeans: Plant Introduction #416937 (P), an experimental strain, and Forrest (F), a commercial variety.

library(nlme)
library(lattice)
data(Soybean)
?Soybean
Soy <- Soybean 
Soy$logweight <- log(Soy$weight)
xyplot(logweight ~ Time, Soy, groups = Plot, type = c('g','l','b')) # Spaghetti plot

Questions

  1. Fit a model for logweight based on Variety and Year
g1 <- lme(logweight ~ Year + Variety + Time + I(Time^2),random = list(Plot = ~ 1 + Time), data = Soy, method = "REML") 
# or g1 <- lme(logweight ~ Year + Variety + Time + I(Time^2),random =  ~ 1 + Time|Plot, data = Soy, method = "REML") 
summary(g1)
## Linear mixed-effects model fit by REML
##  Data: Soy 
##         AIC      BIC   logLik
##   -17.19048 22.87305 18.59524
## 
## Random effects:
##  Formula: ~1 + Time | Plot
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev      Corr  
## (Intercept) 0.214970174 (Intr)
## Time        0.003411878 -0.81 
## Residual    0.190853283       
## 
## Fixed effects: logweight ~ Year + Variety + Time + I(Time^2) 
##                 Value  Std.Error  DF   t-value p-value
## (Intercept) -4.996188 0.06724163 362 -74.30201  0.0000
## Year1989    -0.399367 0.05050960  44  -7.90675  0.0000
## Year1990     0.043706 0.05041932  44   0.86685  0.3907
## VarietyP     0.330378 0.04129027  44   8.00136  0.0000
## Time         0.201914 0.00230711 362  87.51813  0.0000
## I(Time^2)   -0.001308 0.00002352 362 -55.61852  0.0000
##  Correlation: 
##           (Intr) Yr1989 Yr1990 VartyP Time  
## Year1989  -0.418                            
## Year1990  -0.382  0.488                     
## VarietyP  -0.308  0.003  0.003              
## Time      -0.745  0.075  0.015  0.000       
## I(Time^2)  0.634 -0.076 -0.010 -0.001 -0.957
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.17863140 -0.59460576  0.01326769  0.57401455  3.14914073 
## 
## Number of Observations: 412
## Number of Groups: 48
xyplot(fitted(g1) ~ Time, Soy, groups = Plot, type = c('g','l','b')) # Spaghetti plot

Pig weights data

library(SemiPar)
data(pig.weights)
library(lattice)
xyplot(weight~num.weeks,data=pig.weights,groups=id.num,type=c("g","l"))

head(pig.weights)
##   id.num num.weeks weight
## 1      1         1   24.0
## 2      1         2   32.0
## 3      1         3   39.0
## 4      1         4   42.5
## 5      1         5   48.0
## 6      1         6   54.5

Questions

  1. Fit longitudinal model to explain the growth of the pigs. Use lme or lmer

Example: Titanic survivors data

The dataset is a collection of data about some of the passengers, and the goal is to predict the survival (either 1 if the passenger survived or 0 if they did not) based on some features such as the class of service, the sex, the age etc. As you can see, we are going to use both categorical and continuous variables.

VARIABLE DESCRIPTIONS:
pclass          Passenger Class
                (1 = 1st; 2 = 2nd; 3 = 3rd)
survival        Survival
                (0 = No; 1 = Yes)
name            Name
sex             Sex
age             Age
sibsp           Number of Siblings/Spouses Aboard
parch           Number of Parents/Children Aboard
ticket          Ticket Number
fare            Passenger Fare
cabin           Cabin
embarked        Port of Embarkation
                (C = Cherbourg; Q = Queenstown; S = Southampton)
boat            Lifeboat
body            Body Identification Number
home_dest       Home/Destination

Full description of data set

Download data here

Read train and test set

train <- read.csv('data/titanic_train.csv',header=TRUE,row.names=1)
 test <- read.csv('data/titanic_test.csv',header=TRUE,row.names=1)

Questions:

Example: Mackerel data from a Spanish survey

These data were recorded by a Spanish survey, as part of a multi-country survey of the abundance of mackerel eggs off the coast of north-western Europe, in 1992.

library(gamair)
library(sm)
library(mgcv)
library(fields)
library(maps)

data(mackerel)
data(mackp)
attach(mackerel)
Latitude=mack.lat
Longitude=-mack.long

# plot the egg densities against location
plot(Longitude,Latitude,cex=Density/150,col=2,asp=.85)
map("world",add=TRUE,fill=TRUE,col="lightgrey")
*Mackerel eggs abundance*

Mackerel eggs abundance

Fit a gam for the spatial locations

m0<-gam(log(Density)~s(Longitude,Latitude,k=30))
# vis.gam(m0,plot.type="contour",color="terrain")
# map("world",add=TRUE,fill=TRUE,col="grey")
summary(m0)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Density) ~ s(Longitude, Latitude, k = 30)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.22409    0.06126   52.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                         edf Ref.df     F p-value    
## s(Longitude,Latitude) 24.24  27.67 15.63  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.606   Deviance explained = 64.1%
## GCV = 1.1512  Scale est. = 1.0471    n = 279
*Smooth spatial trend*

Smooth spatial trend

Now we include depth, Temperature and Salinity

ldepth <- log(mack.depth) # logarithm scale
m1<-gam(log(Density)~s(Longitude,Latitude)+s(Temperature)+s(Salinity)+s(ldepth))
par(mfrow=c(2,2))
plot(m1,scheme=2)
Additive model `gam` fit

Additive model gam fit

summary(m1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## log(Density) ~ s(Longitude, Latitude) + s(Temperature) + s(Salinity) + 
##     s(ldepth)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.22409    0.05581   57.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf Ref.df      F  p-value    
## s(Longitude,Latitude) 20.117 24.731  4.459 1.24e-10 ***
## s(Temperature)         2.324  2.914  3.857   0.0147 *  
## s(Salinity)            1.000  1.000  0.379   0.5388    
## s(ldepth)              2.791  3.529 14.166 3.11e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.673   Deviance explained = 70.4%
## GCV = 0.96301  Scale est. = 0.86901   n = 279

Check residuals

gam.check(m1)
*`gam.check` results*

gam.check results

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 11 iterations.
## The RMS GCV score gradient at convergence was 1.094165e-07 .
## The Hessian was positive definite.
## Model rank =  57 / 57 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value  
## s(Longitude,Latitude) 29.00 20.12    0.91   0.015 *
## s(Temperature)         9.00  2.32    1.03   0.660  
## s(Salinity)            9.00  1.00    0.96   0.170  
## s(ldepth)              9.00  2.79    1.03   0.700  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Remove Salinity

m2<-gam(log(Density)~s(Longitude,Latitude)+s(Temperature)+s(ldepth))
par(mfrow=c(2,2))
plot(m2,scheme=2,1)
*Smooth term of `m2` model*

Smooth term of m2 model

Mackerel data from a Spanish Survery These data were recorded by a Spanish survey, as part of a multi-country survey of the abundance of mackerel eggs off the coast of north-western Europe, in 1992. This data exhibit rather different features from the remainder of the survey. One of these features is that no eggs were detected at all at a substantial number of the sampling points. This is due to the smaller nets and the need to compensate by taking a larger number of smaller volume samples. The sampling locations are shown in the next Figure.

## The following objects are masked from mackerel:
## 
##     Density, Temperature
*Sampling positions with presence and absence of eggs*

Sampling positions with presence and absence of eggs

We consider a logistic model for the presence of eggs using as the \(\log\) of the depth as covariate

library(mgcv)
logit.gam <- gam(Presence ~ s(ldepth),family=binomial)
plot(logit.gam)
*logistic regression estimate of the relationship between presence and log of depth.*

logistic regression estimate of the relationship between presence and log of depth.

logit1 <- gam(Presence~s(Longitude,Latitude)+
                s(ldepth)+s(Temperature),family=binomial)
summary(logit1)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## Presence ~ s(Longitude, Latitude) + s(ldepth) + s(Temperature)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.6816     0.7591   -4.85 1.23e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf Ref.df Chi.sq  p-value    
## s(Longitude,Latitude) 28.738  28.97 67.083 8.38e-05 ***
## s(ldepth)              1.000   1.00  3.227   0.0724 .  
## s(Temperature)         2.839   3.63 11.264   0.0234 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.322   Deviance explained = 34.2%
## UBRE = -0.085903  Scale est. = 1         n = 417
logit2 <- gam(Presence~s(Longitude,Latitude)+
                s(ldepth),family=binomial)
summary(logit2)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## Presence ~ s(Longitude, Latitude) + s(ldepth)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.9601     0.8548  -4.633  3.6e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf Ref.df Chi.sq  p-value    
## s(Longitude,Latitude) 28.822 28.982  65.65 0.000112 ***
## s(ldepth)              6.326  7.544  10.20 0.279097    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.299   Deviance explained = 33.8%
## UBRE = -0.068872  Scale est. = 1         n = 417
logit3 <- gam(Presence~s(Longitude,Latitude)+
                s(Temperature),family=binomial)
summary(logit3)
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## Presence ~ s(Longitude, Latitude) + s(Temperature)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.5245     0.7247  -4.863 1.15e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                          edf Ref.df Chi.sq  p-value    
## s(Longitude,Latitude) 28.711 28.968 73.830 1.03e-05 ***
## s(Temperature)         2.596  3.339  8.205   0.0647 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.313   Deviance explained = 33.2%
## UBRE = -0.08136  Scale est. = 1         n = 417